docs: add Workload Identity guide #855
Document the workload identity feature for sandboxes: the identity-token file and on-demand token server APIs, the injected environment variables and token claims, use cases (AWS STS federation, GCP/Azure, internal service-to-service), external token verification, and the sharp edges (fixed refresh loop, internal token-server address, distributed-runner coordinator issuance, restart secret persistence, operator-driven key rotation). Wire the new page into the Networking & Security sidebar.
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info: configuration used: Organization UI; review profile: CHILL; plan: Pro.
📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 Walkthrough: This pull request adds a new documentation page describing Miren's Workload Identity feature, which allows sandboxes to authenticate to external services using short-lived signed OIDC identity tokens. The documentation explains Miren as an OIDC issuer, two token acquisition methods (file-based and HTTP endpoint), the JWT claims and structured…
Review: docs: add Workload Identity guide
I genuinely enjoyed reading this. It's a well-structured, thoughtful guide for a nuanced security feature, and the author has clearly thought hard about what users need to know before they get tripped up. The opening callout distinguishing "Workload Identity" from "CI/CD OIDC" is a great orientation — exactly the kind of confusion that bites newcomers. The "Sharp Edges & Limitations" section is thorough and honest, especially the guidance around not caching the token file and the behavior differences under distributed runners.
The sidebar placement under Networking & Security feels right, the cross-links to /ci-deploy and /labs are consistent with other docs in the repo, and the Docusaurus front-matter (title, description, keywords) is complete and correct.
One factual inconsistency worth fixing before merge
The sub claim format is described in two places that don't quite agree:
Line 115 — the template:
org:<organization_id>:app:<app>:sandbox:<sandbox_id>
Line 123 — the example JSON payload:
"sub": "org:acme:app:web:sandbox:sb_01HX..."
Here web is the app, and sb_01HX... is the sandbox_id, which looks consistent with the template.
Line 132 (also in the example JSON) — the sandbox_id custom claim:
"sandbox_id": "sandbox/demo-web-xxyyzz"
The sub example uses sb_01HX... as the sandbox_id value, while the separate sandbox_id claim shows sandbox/demo-web-xxyyzz. These appear to be from two different fixture datasets mixed together, which could confuse a reader trying to map the sub value back to the individual claims. Ideally, the JSON example should be internally consistent: the value in sub for the sandbox should match sandbox_id, and the app in sub should match app. A small fix to unify the example payload would remove any ambiguity.
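The consistency being asked for here can be checked mechanically. Below is a minimal Go sketch, assuming the documented `org:<organization_id>:app:<app>:sandbox:<sandbox_id>` template and that none of the three IDs contains a colon (true of the sample values in this thread, but an assumption about the real ID formats). `parseSubject` and `checkSubjectConsistency` are hypothetical helpers, not part of Miren.

```go
package main

import (
	"fmt"
	"strings"
)

// subjectParts holds the components of a `sub` claim built from the
// template org:<organization_id>:app:<app>:sandbox:<sandbox_id>.
type subjectParts struct {
	OrganizationID string
	App            string
	SandboxID      string
}

// parseSubject splits a `sub` value on ":" and checks the fixed labels.
// Assumption: none of the IDs contains a ":".
func parseSubject(sub string) (subjectParts, error) {
	f := strings.Split(sub, ":")
	if len(f) != 6 || f[0] != "org" || f[2] != "app" || f[4] != "sandbox" {
		return subjectParts{}, fmt.Errorf("unexpected sub format: %q", sub)
	}
	return subjectParts{OrganizationID: f[1], App: f[3], SandboxID: f[5]}, nil
}

// checkSubjectConsistency reports an error when `sub` disagrees with the
// standalone organization_id, app, or sandbox_id claims of the same token.
func checkSubjectConsistency(sub, orgID, app, sandboxID string) error {
	p, err := parseSubject(sub)
	if err != nil {
		return err
	}
	if p.OrganizationID != orgID || p.App != app || p.SandboxID != sandboxID {
		return fmt.Errorf("sub %q disagrees with claims (org=%q app=%q sandbox=%q)",
			sub, orgID, app, sandboxID)
	}
	return nil
}
```

Run against the mixed example above (`org:acme:app:web:sandbox:sb_01HX...` next to `sandbox_id: sandbox/demo-web-xxyyzz`), the check returns an error; run against a harmonized payload, it returns nil.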
Minor notes
- Line 59 vs. line 205 — the token file refresh interval is described as "roughly every 45 minutes" in both places, which is helpfully consistent. Just worth keeping in sync if the interval ever changes.
- GCP and Azure section (line 163) — deliberately lighter on detail than the AWS section, which is fine. A follow-up issue or TODO noting that provider-specific guides are wanted would be a nice signal to future contributors. Not a blocker.
- The document correctly ends without a trailing newline in the last line of the "Key rotation" section — fine for Docusaurus.
Bottom line
This is very close to merge-ready. The only thing I'd flag before merging is the inconsistent sub / sandbox_id values in the example JSON payload (lines 123 vs 132). It's documentation, not code, so the blast radius is low, but a user building an AWS trust policy based on the example could write the wrong StringEquals condition if they copy the wrong value. A one-line fix to harmonise the example makes this solid.
"organization_id": "org-demo-xyz",
"cluster_id": "cluster-aabbcc",
"app": "demo",
"sandbox_id": "sandbox/demo-web-xxyyzz"
The sandbox_id claim value here ("sandbox/demo-web-xxyyzz") doesn't match the sub claim on line 123, which uses sb_01HX... as the sandbox_id component. Both values appear to come from different fixture datasets. Please unify the example so the sandbox portion of sub matches the standalone sandbox_id claim — readers who cross-reference the two fields to write AWS trust-policy conditions will otherwise be confused about which format to expect.
### GCP and Azure
Both Google Cloud and Azure support OIDC-based **workload identity federation**. Configure a workload identity pool / federated credential that trusts your cluster's issuer URL and matches on the token's subject or audience, then exchange the Miren token for cloud credentials using each provider's federation flow. The mechanics differ per provider, but the trust relationship is the same: they verify the token against your cluster's JWKS.
The GCP and Azure section is intentionally high-level compared to the detailed AWS walkthrough, which is a reasonable editorial choice. Consider opening a follow-up issue (or adding a brief TODO comment here in the source) to flesh out provider-specific examples for both clouds — it would make this guide complete for users who don't use AWS.
Actionable comments posted: 1
🧹 Nitpick comments (1)
docs/docs/workload-identity.md (1)
114-114: 💤 Low value
Consider adding language specifiers to fenced code blocks.
Three code blocks lack language specifiers, which triggers markdownlint warnings. While these blocks contain URL paths and format templates rather than executable code, adding the `text` identifier would satisfy the linter and improve consistency.
📝 Proposed fixes
Line 114:
```diff
-```
+```text
 org:<organization_id>:app:<app>:sandbox:<sandbox_id>
```
Line 179:
```diff
-```
+```text
 <issuer>/.well-known/openid-configuration
```
Line 187:
```diff
-```
+```text
 <issuer>/.well-known/miren/jwks
```
Also applies to: 179-179, 187-187

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @docs/docs/workload-identity.md:
- Line 83: The doc line about `audience` is incorrect: the implementation sets JWT `aud` to `[]string{"miren"}` when `audience` is omitted (tests assert `claims.Audience == ["miren"]`), so either (a) update docs/docs/workload-identity.md to state that the default audience is "miren" (replace "defaults to the cluster issuer/issuer" with the actual default) or (b) if the intended default should be the cluster issuer, change the implementation that sets the default audience (where JWT claims are populated and `claims.Audience` is set) to use the cluster issuer instead of `[]string{"miren"}` and update tests accordingly; choose one approach and make consistent changes to the `audience` docs, the code that sets `claims.Audience`, and any tests that assert `claims.Audience`.
Nitpick comments:
In @docs/docs/workload-identity.md:
- Line 114: Three fenced code blocks containing non-executable text (the org/app sandbox template and two issuer URL examples) are missing language specifiers; update each fenced block to use the `text` language identifier by changing the opening fences to specify `text` for the blocks containing "org:<organization_id>:app:<app>:sandbox:<sandbox_id>", "<issuer>/.well-known/openid-configuration", and "<issuer>/.well-known/miren/jwks" so markdownlint warnings are resolved and formatting is consistent.

ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: `00ff9da3-335b-44ce-baac-ffa36161a1a0`
Commits: reviewing files that changed from the base of the PR and between c3e71c5f91122f2d65e4f66c8b807ce4cdda3594 and 3e1104c79665b3847f5b99b425bc0e45daf3a23a.
Files selected for processing (2):
* `docs/docs/workload-identity.md`
* `docs/sidebars.ts`
- Harmonize the example JWT so the sub claim matches the standalone organization_id, app, and sandbox_id values (CodeRabbit, P0).
- Link to GCP and Azure workload identity federation docs in the GCP/Azure use case.
The token server defaults the aud claim to "miren" when no audience is requested (issuer.go IssueTokenWithOptions), not the cluster issuer. Update both the query-parameter description and the claims table to match.
Re-review: docs: add Workload Identity guide
Happy to report that the one concern I flagged before merge has been properly resolved. Let me walk through what changed and where things stand now.
What was fixed ✅
The sub / sandbox_id inconsistency is gone. In the previous revision, the sub claim in the example JSON used sb_01HX... as the sandbox_id component while the standalone sandbox_id claim showed sandbox/demo-web-xxyyzz — two different fixture datasets mixed together. That was exactly the kind of thing that could trip up a reader writing an AWS trust policy StringEquals condition.
The current example JSON is now fully internally consistent:
"sub": "org:org-demo-xyz:app:demo:sandbox:sandbox/demo-web-xxyyzz",
...
"organization_id": "org-demo-xyz",
"app": "demo",
"sandbox_id": "sandbox/demo-web-xxyyzz"
Every field maps cleanly to its corresponding component in sub. A reader can now trace the subject string directly back to the individual claims with no ambiguity.
The audience default — also clean ✅
CodeRabbit flagged that the previous docs said the default audience was "the cluster issuer" while the implementation used "miren". The current text on line 83 correctly states "Defaults to `miren` if omitted", and line 104 (in the claims table) also says "defaults to `miren`, or what you requested". Consistent throughout.
Everything else still holds
The overall structure, the CI/CD OIDC distinction callout, the env-var table, the AWS walkthrough with external links for GCP/Azure, and the "Sharp Edges & Limitations" section are all as solid as they were in the first revision. Nothing has regressed.
This is clean, accurate, and ready to ship. 🎉
Summary
Adds a new documentation page for Workload Identity (MIR-1220), covering the feature where every Miren sandbox automatically receives a signed OIDC identity token for authenticating to external services without baked-in long-lived credentials.
The page documents:
`/var/run/miren/identity-token` file, and the on-demand token server (`GET $MIREN_IDENTITY_TOKEN_URL` with a per-sandbox bearer secret, optional `audience`/`ttl`). Includes the injected env-var table, request/error reference, claim table, and a sample decoded JWT.
Wired into the Networking & Security section of the sidebar.
Test plan
`cd docs && bun run build`: no broken links or MDX errors.